ABSTRACT: This report summarizes the activities at Boeing Computer Services
Company on AFOSR Contract F49620-85-C-0057 from April 16, 1985 until August
15, 1986. Five tasks are defined in our analysis of quotient tree algorithms
and frontal methods: analyses of multi-frontal methods, creation of symmetric
out-of-core minimal storage sparse elimination schemes, analyses of quotient
tree orderings, and completion of the Harwell-Boeing sparse matrix test
collection. Reports on the progress of the work on the five tasks are given,
relevant reports and publications of project personnel are listed, and related
sparse matrix activities at Boeing Computer Services are described.


INTRODUCTION 


The  solution  of  systems  of  large  sparse  linear  equations  is  a  fundamental 
computational  step  in  the  numerical  solution  of  many  scientific  and 
engineering  problems.  A  key  aspect  of  the  solution  is  the  choice  of  a 
reordering  heuristic,  in  which  the  equations  are  presented  in  a  new  order  that 
reduces some measure of the cost of the solution. Algorithms for obtaining
the optimal reordering are usually too expensive; for all practical
purposes, therefore, only reordering heuristics that produce a near-optimal
reordering are considered. Different reordering heuristics have arisen in many
disciplines,  reflecting  different  types  of  sparse  matrix  systems  and  different 
approaches  to  cost. 
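
As a small illustration of why the ordering matters, the following sketch counts the fill entries created by symmetric elimination on an "arrow" pattern under two orderings. The function name fill_in and the test pattern are illustrative assumptions and are not taken from the project codes.

```python
def fill_in(n, edges, order):
    """Count fill edges created by symmetric elimination in the given order."""
    adj = {v: set() for v in range(n)}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    fill = 0
    for v in order:
        nbrs = sorted(adj.pop(v))        # uneliminated neighbours of the pivot
        for u in nbrs:
            adj[u].discard(v)
        for i, a in enumerate(nbrs):     # the pivot's neighbours become a clique
            for b in nbrs[i + 1:]:
                if b not in adj[a]:
                    adj[a].add(b)
                    adj[b].add(a)
                    fill += 1
    return fill


# "Arrow" pattern: unknown 0 is coupled to every other unknown.
n = 6
edges = [(0, k) for k in range(1, n)]
hub_first = fill_in(n, edges, list(range(n)))            # eliminate unknown 0 first
hub_last = fill_in(n, edges, list(range(1, n)) + [0])    # eliminate unknown 0 last
print(hub_first, hub_last)
```

Eliminating the fully coupled unknown first couples all remaining unknowns to each other, while eliminating it last creates no fill at all; a reordering heuristic tries to find orderings of the second kind.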


This  research  project  was  concerned  with  improving  our  understanding  of 
several different classes of reordering heuristics. One is the general
category of frontal methods; a second category is quotient tree orderings.
All  these  heuristic  algorithms  are  in  some  sense  general  purpose,  but  have 
been  developed  for  different  problems  and  with  different  goals  in  mind.  Our 
general  objectives  were  to  understand  the  relationship  of  frontal  methods  to 
general  sparse  methods,  and  to  characterize  and  evaluate  quotient  tree 
methods.  This  report  describes  the  current  status  of  the  project.  The 
following  topics  are  covered  by  this  report:  research  objectives,  status  of 
the  research  effort,  relevant  publications  by  the  project  personnel, 
professional  personnel  associated  with  the  research  effort,  and  related  sparse 
matrix  activities  at  Boeing  Computer  Services. 


RESEARCH  OBJECTIVES 


The research objectives were broken down into five tasks. The tasks, as
described in Report No. 1 from April 10, 1986, were:



Task  1:  Analysis  of  Multi-Frontal  Orderings. 


This task includes an analysis of the storage requirements for frontal
factorizations of symmetric indefinite matrices and an investigation of
methods combining both the advantages of frontal and standard elimination
schemes. Our plan for this analysis consisted of the following subtasks:

(a) formalization of proofs of equivalence

(b) analysis of space requirements

(c) empirical study of the behavior for large positive definite systems

(d) empirical evaluation of the behavior for large indefinite systems

(e) preparation of a technical paper containing the results of (a) - (d)

(f) possible modifications to multi-frontal techniques based on the
earlier subtasks.


Task  2:  Creation  of  a  Symmetric  Indefinite  Out-of-Core  General  Sparse  Solver. 


This  activity  is  an  attempt  to  exploit  Liu's  work  on  out-of-core  solution  of 
positive  definite  linear  systems  to  produce  a  generalization  for  symmetric 
indefinite  systems.  The  approach  is  to  use  Liu's  experimental  code  as  a 
prototype,  to  be  modified  into  a  code  that  approaches  the  indefinite  case  in  a 
generalization of the manner used in the multifrontal codes. Our expectation
is  that  the  behavior  of  the  new  code  will  mirror  that  of  the  multifrontal 
code,  which  will  confirm  the  explanation  of  the  relationships  between  them. 


The  plan  for  this  work  was  as  follows: 

(a)  conversion  of  Liu's  code  from  York  to  Boeing  Cray 

(b)  removal  of  compression  from  the  subscript  data  structure 

(c)  incorporation  of  2  by  2  pivoting  for  indefiniteness 

(d)  empirical  evaluation  of  the  indefinite  code  on  test  problems  derived 
from  large  structural  eigenvalue  problems 

(e)  preparation  of  a  technical  paper  on  the  results 


Task  3:  Analysis  of  an  Out-of-Core  General  Sparse  Algorithm. 

One  of  the  key  differences  between  multifrontal  and  current  general  sparse 
algorithms  is  the  ordering  of  operations  performed  in  Gaussian  elimination. 
General  sparse  formulations  use  a  GAXPYI  or  inner  product  formulation,  whereas 
frontal  codes  use  a  SAXPY  or  outer  product  formulation.  Another  outer  product 
formulation  using  SAXPYI's  (indexed  SAXPY's)  is  possible  and  is  quite  similar 
in  spirit  to  the  so-called  minimum  storage  sparse  elimination  algorithm.  We 
proposed  to  investigate  possible  out-of-core  outer  product  factorization 
according  to  the  following  subtasks: 

(a)  modify  symbolic  factorization  from  (2b)  to  predict  storage 
requirements  for  the  new  algorithm 

(b)  use  the  experimental  code  from  2  as  the  basis  for  creating  a  code  to 
use  in  testing  the  numerical  behavior  of  the  new  algorithm 

(c)  evaluate  the  new  code  on  the  standard  large  test  problems 

(d)  prepare  technical  paper  on  the  results. 
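
The computational kernels named in this task can be sketched as follows. This is a plain illustration of the inner product, dense SAXPY and indexed SAXPY operations only; the function names and the list-based storage are assumptions, and the sketch is not the CRAY code used in the project.

```python
def dot(x, y):
    """Inner product kernel: produces one scalar per call."""
    return sum(a * b for a, b in zip(x, y))


def saxpy(alpha, x, y):
    """Dense SAXPY: y <- y + alpha * x over full-length vectors."""
    for i in range(len(x)):
        y[i] += alpha * x[i]


def saxpyi(alpha, x_vals, x_idx, y):
    """Indexed SAXPY: add alpha * x into y, where x is stored as
    nonzero values x_vals together with their positions x_idx."""
    for val, i in zip(x_vals, x_idx):
        y[i] += alpha * val


# The same update applied once with a dense vector and once with the
# compressed (values, indices) form of that vector.
y_dense = [1.0, 2.0, 3.0, 4.0, 5.0]
y_sparse = list(y_dense)
saxpy(2.0, [0.0, 1.5, 0.0, 0.0, -1.0], y_dense)
saxpyi(2.0, [1.5, -1.0], [1, 4], y_sparse)
print(y_dense, y_sparse, dot([1.0, 2.0, 3.0], [4.0, 5.0, 6.0]))
```

The dense form touches every entry of the vector, including zeros, but has regular memory access; the indexed form touches only the nonzeros at the price of indirect addressing.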


Task  4:  Analysis  of  Quotient  Tree  Orderings. 

The evaluation of the refined quotient tree ordering (RQT) by George and Liu
on some large three dimensional structural analysis problems has shown this
method to be very efficient, in particular with respect to storage
requirements. Recently Zmijewski and Gilbert have published a theoretical
analysis of alternative quotient tree algorithms for certain model problems.
The objectives of this task were to analyze these and other quotient tree
algorithms and investigate a possible general characterization of quotient
tree algorithms, with the possibility of generating new ordering heuristics.
The  planned  subtasks  were: 

(a)  characterize  relationship  of  level  structures  and  quotient  tree 
orderings 

(b)  preliminary  empirical  study  of  quotient  tree  orderings 

(c) evaluate hole breaking strategies on model problem

(d)  evaluate  other  modifications  to  quotient  tree  orderings  on  model 
problems 

(e)  empirical  study  using  modified  SPARSPAK  code  addressing  results  of 
(c)  and  (d) 

(f)  evaluate  tree  structure  as  basis  for  parallel  implementation 

(g)  investigate  relationship  of  out-of-core  quotient  tree  factorization 
algorithm  to  multifrontal  and  general  sparse  work 

(h)  prepare  technical  paper 


Task  5:  Completion  and  Publication  of  the  Harwell-Boeing  Sparse  Matrix  Test 
Collection. 

The  Harwell-Boeing  test  matrix  collection  has  been  a  useful  tool  for  numerous 
researchers.  In  particular,  examples  of  very  large  realistic  problems  have 
been  very  important  in  evaluating  the  real  performance  of  proposed  algorithms. 
A  more  flexible  updating  and  distribution  system  was  needed  for  making  the 
collection  available  to  other  researchers  and  in  preparing  a  formal 
publication on the collection. A number of very large structural engineering
examples from Boeing were to be added to the collection. Dr. Iain Duff
from  AERE  Harwell  is  a  collaborator  on  this  task.  The  subtasks  were: 

(a)  finalizing  current  collection 

(b)  completion  of  distribution  system 

(c)  preparation  of  formal  announcement  (paper). 


STATUS OF THE RESEARCH EFFORT

Significant  progress  was  made  in  all  of  the  five  tasks,  and  Task  5  was 
essentially completed. We anticipate completing Tasks 1 through 4 in the option
year  of  the  contract.  The  progress  made  so  far  is  described  in  detail  below: 

Task 1. Tasks 1(a) through 1(d) were completed during the first year of
the project. The relationship between multifrontal and general sparse
algorithms has been formalized. We have been able to show that, given the
same ordering, the number of operations performed by a multifrontal
implementation of Gaussian elimination is equivalent to the number of
operations in a general sparse scheme. The in-core storage requirements for
general sparse elimination, the frontal method, and the multifrontal method
have been analyzed and compared on a variety of problems and with a number of
different orderings. The orderings considered were reverse Cuthill-McKee,
automated nested dissection, quotient minimum degree, multiple minimum degree,
the MA27 minimum degree ordering, and the ordering generated by Liu's
out-of-core solver. Our preliminary findings indicate that the multifrontal
scheme required the least in-core storage among the methods considered.
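
To make the idea behind the minimum degree family of orderings concrete, here is a textbook greedy sketch. It is not the quotient minimum degree, multiple minimum degree or MA27 variant evaluated above; the function name and the tie-breaking rule (smallest index) are assumptions made for illustration.

```python
def minimum_degree_order(n, edges):
    """Greedy minimum degree: repeatedly eliminate a node of smallest
    current degree. Returns the elimination order and the fill count."""
    adj = {v: set() for v in range(n)}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    order, fill = [], 0
    while adj:
        # Ties are broken by the smallest node index (an arbitrary choice).
        v = min(adj, key=lambda u: (len(adj[u]), u))
        nbrs = sorted(adj.pop(v))
        for u in nbrs:
            adj[u].discard(v)
        for i, a in enumerate(nbrs):     # neighbours of the pivot become a clique
            for b in nbrs[i + 1:]:
                if b not in adj[a]:
                    adj[a].add(b)
                    adj[b].add(a)
                    fill += 1
        order.append(v)
    return order, fill


# Star-shaped pattern with hub 0: the heuristic postpones the hub.
order, fill = minimum_degree_order(5, [(0, k) for k in range(1, 5)])
print(order, fill)
```

Production codes avoid forming the cliques explicitly, which is where the quotient graph and multiple elimination refinements come in.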


Task 2. Tasks 2(a) through 2(c) have been completed. A prototype sparse
symmetric indefinite out-of-core solver has been implemented. The
implementation is based on Liu's out-of-core solver, which has been converted
to the Boeing CRAY. This symmetric positive definite solver has been modified
to incorporate two by two pivoting for the symmetric indefinite case. Columns
of the matrix that cannot be eliminated for stability reasons, as indicated
by the Bunch-Kaufman test, are put on a stack. Implementing the stack and
accounting for the additional fill-in caused by the interchanges required some
major recoding. In the current version of the code some details in the
garbage collection on the stack require further refinement. After these
modifications have been implemented, we will be able to evaluate the code on
large indefinite structures problems, and compare its performance to the
multifrontal method.
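
For reference, the classical dense form of the Bunch-Kaufman pivot test can be sketched as below. This only shows the standard textbook decision for the leading column of a small dense symmetric matrix; how the prototype solver applies the test to sparse columns and manages its stack is not reproduced here, and the function name choose_pivot is an assumption.

```python
import math

ALPHA = (1.0 + math.sqrt(17.0)) / 8.0    # classical Bunch-Kaufman constant


def choose_pivot(a):
    """Pivot decision for the leading column of the dense symmetric matrix a
    (list of lists). Returns ("1x1", k) or ("2x2", (0, r))."""
    n = len(a)
    if n == 1:
        return ("1x1", 0)
    # Largest off-diagonal magnitude in column 0 and the row where it occurs.
    r = max(range(1, n), key=lambda i: abs(a[i][0]))
    lam = abs(a[r][0])
    if lam == 0.0 or abs(a[0][0]) >= ALPHA * lam:
        return ("1x1", 0)                 # diagonal is large enough (or column is already zero)
    # Largest off-diagonal magnitude in column r.
    sigma = max(abs(a[i][r]) for i in range(n) if i != r)
    if abs(a[0][0]) * sigma >= ALPHA * lam * lam:
        return ("1x1", 0)
    if abs(a[r][r]) >= ALPHA * sigma:
        return ("1x1", r)                 # interchange rows/columns 0 and r
    return ("2x2", (0, r))                # use the 2 by 2 block on rows 0 and r


print(choose_pivot([[4.0, 1.0], [1.0, 3.0]]))
print(choose_pivot([[0.0, 1.0], [1.0, 0.0]]))
```

The second matrix has a zero diagonal, so no 1 by 1 pivot is acceptable, yet the 2 by 2 block itself is nonsingular; this is the situation two by two pivoting is designed for.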


Task 3. Tasks 3(a) through 3(c) have been completed. Three variants of a new
class of methods called tree profile methods have been implemented. The tree
profile methods are frontal methods based on the elimination tree. In Version
1 a complete path from the leaf to be eliminated to the root is kept in
core. Version 2 only keeps in core the complete path from the leaf to the
highest ancestor directly connected to the leaf. Version 3 only has in core
the directly connected ancestors of a leaf node, and therefore requires the
least in-core storage of the three. Its implementation is not much more
complicated than the implementations of Versions 1 and 2. Thus, among tree
profile methods, Version 3 appears to be the most efficient.
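
The elimination tree underlying these methods can be computed directly from the nonzero pattern. The sketch below uses the well-known path-compression algorithm for the parent array and then lists the nodes on the path from a leaf to the root, which is the set of nodes Version 1 is described as keeping in core. The function names and the input layout are assumptions; the project's data structures are not shown in this report.

```python
def elimination_tree(n, lower):
    """Parent array of the elimination tree of a symmetric pattern.
    lower[i] lists the columns j < i with a nonzero in position (i, j);
    parent[j] == -1 marks a root."""
    parent = [-1] * n
    ancestor = [-1] * n          # path-compressed shortcut towards the current root
    for i in range(n):
        for j in lower[i]:
            # Climb from j towards its current root, redirecting shortcuts to i.
            while j != -1 and j < i:
                nxt = ancestor[j]
                ancestor[j] = i
                if nxt == -1:
                    parent[j] = i
                j = nxt
    return parent


def path_to_root(parent, leaf):
    """Nodes visited when walking from a leaf up to the root of its tree."""
    path = [leaf]
    while parent[path[-1]] != -1:
        path.append(parent[path[-1]])
    return path


# 5 by 5 pattern: row 2 has nonzeros in columns 0 and 1, row 4 in columns 2 and 3.
parent = elimination_tree(5, [[], [], [0, 1], [], [2, 3]])
print(parent, path_to_root(parent, 0))
```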

Tree profile methods are based on a dense SAXPY or outer product formulation.
They were compared to a forward sparse storage scheme using indexed SAXPY's
(SAXPYI's). The forward sparse scheme is closely related to the minimum
storage Gaussian elimination scheme by Sherman. In numerical tests these
methods were evaluated in a setup similar to that of Task 1, which allows a
direct comparison to multifrontal and general sparse methods. It turns out
that tree profile methods require somewhat more storage and a higher operation
count than forward sparse methods. However, the advantage of tree profile
methods is that they perform dense SAXPY's instead of sparse SAXPYI's. Hence
tree profile methods have a longer average vector length.

Task 4. Tasks 4(a), (b), (d), and (e) have been completed. We have been able
to relate quotient tree orderings directly to elimination trees, and thus
developed a way to generate quotient tree partitionings from arbitrary sparse
matrix orderings. Based on a theoretical analysis on regular grid problems,
quotient tree orderings derived from a minimum degree ordering appeared to be
promising, and have been implemented. The variants which have been
implemented include a level structure rooted at the node ordered last by the
minimum degree ordering. This ordering generally generated wide quotient
trees, which will be of use in the parallel implementation of RQT to be
studied in the future. Overall, our empirical studies indicate that none of
the new RQT orderings considered significantly improved over SPARSPAK RQT with
regard to storage and/or operation counts. However, the results also
showed that RQT's performance is comparable to and sometimes better than
minimum degree on large structures problems.
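
A rooted level structure of the kind mentioned above is a breadth-first partition of the graph of the matrix. The following minimal sketch builds one for a small grid graph; the helper names and the choice of a 3 by 3 grid are assumptions, and the sketch does not include the refinement step of the RQT algorithm.

```python
def grid_graph(rows, cols):
    """Adjacency sets of a rows-by-cols grid (5-point stencil pattern)."""
    adj = {r * cols + c: set() for r in range(rows) for c in range(cols)}
    for r in range(rows):
        for c in range(cols):
            v = r * cols + c
            if c + 1 < cols:
                adj[v].add(v + 1)
                adj[v + 1].add(v)
            if r + 1 < rows:
                adj[v].add(v + cols)
                adj[v + cols].add(v)
    return adj


def level_structure(adj, root):
    """Breadth-first levels: level k holds the nodes at distance k from root."""
    levels = [[root]]
    seen = {root}
    while True:
        nxt = []
        for v in levels[-1]:
            for u in sorted(adj[v]):
                if u not in seen:
                    seen.add(u)
                    nxt.append(u)
        if not nxt:
            return levels
        levels.append(nxt)


# Root the structure at the last node, standing in for "the node ordered last".
levels = level_structure(grid_graph(3, 3), 8)
print(levels)
```

Each level separates the levels before it from the levels after it, which is what makes a level structure a natural starting partition for a quotient tree.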


Task  5.  Task  5  has  been  completed.  During  the  project  year  several  new 
matrices  have  been  added  to  the  collection,  which  now  contains  about  225 
matrices.  Some  very  large  matrices  have  been  obtained  from  structural 
analysis  models  ranging  up  to  44,609  unknowns.  The  collection  has  been 
extended  to  include  complex  matrices,  and  right  hand  sides  for  iterative 
problems.  The  matrices  now  can  be  accessed  through  a  data  base,  which  allows 
a  user  to  extract  only  matrices  of  a  certain  problem  class  (e.g.  chemical 
engineering)  or  matrix  type  (e.g.  symmetric).  The  availability  of  the 
collection  to  the  general  scientific  community  will  be  announced  in  a  paper, 
which  is  currently  under  preparation  by  Iain  Duff  at  AERE  Harwell. 

Summer Institute, Seattle, Washington, August 1985.

J.G. Lewis, "Modern Eigenextraction Algorithms for Structural Dynamic
Analyses,"  ICES  '85  International  Conference,  Toronto,  Ontario,  August  1985. 


J.G.  Lewis,  B.  Nour-Omid  and  H.D.  Simon,  "A  Parallel  Algorithm  for  the 
Symmetric  Tridiagonal  Eigenvalue  Problem,"  Second  SIAM  Conference  on  Parallel 
Processing  for  Scientific  Computing,  Norfolk,  Virginia,  November  1985. 

J.G.  Lewis  and  H.D.  Simon,  "The  Impact  of  Hardware  Gather/Scatter  on  Sparse 
Gaussian  Elimination",  12th  International  Conference  on  Parallel  Processing, 
St.  Charles,  Illinois,  August  1986. 

H.D.  Simon,  "Supercomputer  Vectorization  and  Optimization,"  Annual  Meeting  of 
the  American  Mathematical  Society,  Anaheim,  California,  January  1985. 

H.D. Simon, "Incomplete LU Preconditioners for Conjugate-Gradient-Type
Iterative Methods," Eighth SPE Symposium on Reservoir Simulation, Dallas,
Texas, February 1985.

H.D. Simon, R.G. Grimes, J.G. Lewis, L. Komzsik, and D. Scott, "Shifted Block
Lanczos Algorithm in MSC/NASTRAN," MSC/NASTRAN User's Conference, Los Angeles,
California, March 1985.

H.D. Simon, "Supercomputers - Experience and the Future," 12th Annual ACM
SIGUCCS Computer Center Management Symposium, St. Louis, Missouri, March
1985.

H.D. Simon, "Principles of Vectorization," BCS/NSF Supercomputer Summer
Institute, Seattle, Washington, August 1985.

H.D.  Simon,  "Memory  and  Memory  Access  on  the  Cray  X-MP,"  BCS/NSF  Supercomputer 
Summer  Institute,  Seattle,  Washington,  August  1985. 

H.D. Simon, "Software and Algorithms for the Iterative Solution of Sparse
Linear Systems on Supercomputers," BCS/NSF Supercomputer Summer Institute,
Seattle, Washington, August 1985.

H.D. Simon, "Computational Kernels," First RIMSIG Meeting, Lawrence Livermore,
California,  October  1985. 

H.D. Simon, "Approximate Inverses: A Family of Naturally Vectorizing
Preconditioners," Second SIAM Conference on Parallel Processing for Scientific
Computing, Norfolk, Virginia, November 1985.


RECENT  ABSTRACTS  PREPARED  FOR  PRESENTATION  AT  PROFESSIONAL  MEETINGS 

The  following  abstracts  are  preliminary  reports  of  work  in  progress. 

R.G. Grimes, "Experiences in Solving Large Eigenvalue Problems on the Cray
X-MP", Cray Users Group Meeting, Garmisch-Partenkirchen, West Germany,
September 1986.

R.G. Grimes, J.G. Lewis, and H.D. Simon, "A Shifting Strategy for the Lanczos
Algorithm", First International Congress on Computer Methods in Mechanical
Engineering, Austin, TX, September 1986.

Four researchers from Boeing Computer Services, C. Cleveland Ashcraft, Roger
G. Grimes, John G. Lewis, and Horst D. Simon, performed most of the work
during  the  first  year  of  this  contract.  John  G.  Lewis  acted  as  Project 
Manager.  Task  5,  which  is  concerned  with  the  completion  of  the  test  matrix 
collection,  is  being  carried  out  in  collaboration  with  Dr.  Iain  S.  Duff  of 
AERE,  Harwell,  England. 


RELATED  SPARSE  MATRIX  ACTIVITIES  AT  BOEING  COMPUTER  SERVICES  COMPANY 

The  mathematicians  at  Boeing  Computer  Services  working  on  this  project  also 
are  active  in  other  projects  which  involve  sparse  matrix  computations.  This 
section  briefly  describes  some  of  the  most  recent  activities  by  those  people. 
These  projects  are  not  funded  by  the  AFOSR  contract  but  they  indicate  the 
significant  role  that  sparse  matrix  research  plays  at  BCS. 

Sparse  Vector  and  Matrix  Building  Blocks  for  the  CRAY  X-MP.  All  four  of  the 
mathematicians  on  this  project  have  been  involved  in  developing,  implementing 
and  testing  assembler  language  basic  building  blocks  for  sparse  vector  and 
matrix  computations  on  the  CRAY  X-MP  computer.  These  subprograms  are  a  part 
of  VectorPak.  R.G.  Grimes  is  the  project  lead  for  this  Boeing  commercial 
activity. 

CRAY  X-MP  Optimization  of  SPARSPAK  and  COMPLEX  version  of  SPARSPAK.  J.  G. 
Lewis  and  R.  G.  Grimes  have  modified  SPARSPAK  so  that  a  version  optimized  for 
the CRAY X-MP and a COMPLEX version have been produced. A condition number
estimator and stability monitoring have been added.

Out-of-Core Nested Dissection Code. J.G. Lewis continues the work on the
solution of huge systems of linear equations by nested dissection algorithms.
The  linear  systems  have  as  many  as  one  million  equations  and  are  derived  from 
special  finite  element  models  with  a  very  regular  structure.  This  work  is 
undertaken  for  a  major  commercial  customer  and  represents  a  significant 
industrial  production  program. 

Sparse Eigenanalysis for Structural Engineering. R.G. Grimes, J.G. Lewis, and
H.D. Simon have implemented a block shifted and inverted Lanczos algorithm
in the commercial structural engineering analysis package NASTRAN for the
MacNeal-Schwendler Corporation. This represents the state of the art for
sparse matrix eigenanalysis. It has particular relevance to the present
contract in that the sparse linear systems required are symmetric with an
indefinite coefficient matrix. Task 2 of this project will have direct impact
on this application.

This project also included the incorporation of the same analysis techniques
into the Boeing ATLAS structures program, and into the Boeing version of
Georgia Tech's STRUDL program. These three packages, particularly NASTRAN,
provide a rich source of useful test problems.

Iterative Methods and Preconditioners on Vector and Parallel Computers. C.
Ashcraft, R. Grimes, and H.D. Simon are involved in the creation of a test bed
for iterative methods and preconditioners. The fully developed software will
allow a user to investigate the efficiency of a particular
preconditioner/iterative method combination for a particular application. C.
Ashcraft has developed a variety of new preconditioning techniques, which extend
his previous work on computational front techniques for regular grids.
Computational front techniques can be implemented in a dataflow-like manner on
parallel machines. There has been a frequent cross germination of ideas
between this internally funded research program and the current contract.

Analysis  of  Parallel  Architectures.  C.  Ashcraft  has  recently  implemented 
several  industrial  codes  on  hypercube  computers.  H.  Simon  is  project  lead  for 
a  Boeing  project  investigating  current  commercially  available  advanced 
architecture  computers.  R.  Grimes  is  responsible  for  a  benchmark  activity 
involving these machines. C. Ashcraft and J. G. Lewis will participate in a
parallel architecture project, whose goal is to provide a focal point
within Boeing for knowledge about parallel computing. J. G. Lewis will spend a
sabbatical  year  in  1986/87  at  MIT  and  Yale  to  study  the  potential  applications 
of  parallel  architectures  to  scientific  computing. 

Software  Tools  for  Supercomputers.  H.D.  Simon  is  Project  Manager  of  this  work 
funded  by  the  Office  of  Advanced  Scientific  Computing  at  the  NSF.  The  goal  is 
to  investigate  tools  for  porting  application  programs  to  a  parallel 
environment. 


Large Dense Eigenanalysis. This project is part of the above mentioned NSF
contract. H. Simon will extend the Lanczos work to solve dense generalized
eigenvalue problems arising in quantum mechanical calculations. Particular
emphasis will be placed on an efficient out-of-core implementation of these
algorithms.

Sparse  Matrix  Methods  for  Computational  Fluid  Dynamics.  R.  Grimes  and  H. 
Simon  are  involved  in  this  joint  project  with  the  computational  fluid  dynamics 
research  group  at  Boeing.  The  current  approach  of  the  CFD  group  requires  the 
solution of linear systems with up to one million unknowns. A preconditioned
iterative method, with the preconditioner based on a reduced system, then still
requires the exact solution of problems of about 30,000 equations using direct
methods.


